24 research outputs found

    Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs

    Full text link
    This paper studies the problem of forecasting general stochastic processes using an extension of the Neural Jump ODE (NJ-ODE) framework. While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time series, these results were limited to data stemming from Itô diffusions with complete observations, in particular Markov processes where all coordinates are observed simultaneously. In this work, we generalise these results to generic, possibly non-Markovian or discontinuous, stochastic processes with incomplete observations, by utilising the reconstruction properties of the signature transform. These theoretical results are supported by empirical studies, where it is shown that the path-dependent NJ-ODE (PD-NJ-ODE) outperforms the original NJ-ODE framework in the case of non-Markovian data. Moreover, we show that the PD-NJ-ODE can be applied successfully to limit order book (LOB) data.
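    To make the jump-ODE mechanism referred to above concrete, here is a minimal PyTorch sketch: a latent state follows a learned drift between observation times and is reset by a jump network whenever an observation arrives. The class, dimensions, and Euler discretisation are illustrative assumptions and not the authors' PD-NJ-ODE implementation, which in addition exploits the signature transform of the observed path.

```python
import torch
import torch.nn as nn

class NeuralJumpODESketch(nn.Module):
    """Latent state evolved by a neural ODE between observations (Euler steps)
    and updated by a jump network whenever an observation arrives."""
    def __init__(self, obs_dim, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.drift = nn.Sequential(nn.Linear(latent_dim + 1, 64), nn.Tanh(),
                                   nn.Linear(64, latent_dim))
        self.jump = nn.Sequential(nn.Linear(latent_dim + obs_dim, 64), nn.Tanh(),
                                  nn.Linear(64, latent_dim))
        self.readout = nn.Linear(latent_dim, obs_dim)

    def forward(self, obs_times, obs_values, t_grid):
        # obs_times: (num_obs,), obs_values: (batch, num_obs, obs_dim), t_grid: (num_steps,)
        h = torch.zeros(obs_values.shape[0], self.latent_dim)
        preds, k = [], 0
        for i in range(1, len(t_grid)):
            dt = t_grid[i] - t_grid[i - 1]
            t_in = torch.full((h.shape[0], 1), float(t_grid[i - 1]))
            h = h + dt * self.drift(torch.cat([h, t_in], dim=-1))          # ODE step
            if k < len(obs_times) and torch.isclose(t_grid[i], obs_times[k]):
                h = self.jump(torch.cat([h, obs_values[:, k]], dim=-1))    # jump at observation
                k += 1
            preds.append(self.readout(h))
        return torch.stack(preds, dim=1)
```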

    Estimating Full Lipschitz Constants of Deep Neural Networks

    Full text link
    We estimate the Lipschitz constants of the gradient of a deep neural network and of the network itself with respect to the full set of parameters. We first develop estimates for a deep feed-forward densely connected network and then, in a more general framework, for all neural networks that can be represented as solutions of controlled ordinary differential equations, where time appears as continuous depth. These estimates can be used to set the step size of stochastic gradient descent methods, which is illustrated for one example method.
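    The quantity being bounded can be probed numerically. The following sketch is a naive sampling-based lower bound on the Lipschitz constant of the parameter-to-output map, useful only as a sanity check; it is not the paper's analytical estimates, and the function and network below are assumptions for illustration.

```python
import copy
import torch
import torch.nn as nn

def empirical_param_lipschitz(model, x, n_pairs=200, scale=1e-2):
    """Naive lower bound on the Lipschitz constant of theta -> f(x; theta):
    largest observed ratio ||f(x;theta1) - f(x;theta2)|| / ||theta1 - theta2||
    over random parameter perturbation pairs around the current parameters."""
    base = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
    probe, best = copy.deepcopy(model), 0.0
    with torch.no_grad():
        for _ in range(n_pairs):
            theta1 = base + scale * torch.randn_like(base)
            theta2 = base + scale * torch.randn_like(base)
            torch.nn.utils.vector_to_parameters(theta1, probe.parameters())
            y1 = probe(x)
            torch.nn.utils.vector_to_parameters(theta2, probe.parameters())
            y2 = probe(x)
            best = max(best, ((y1 - y2).norm() / (theta1 - theta2).norm()).item())
    return best

net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
print(empirical_param_lipschitz(net, torch.randn(32, 4)))
```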

    Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework

    Full text link
    The Path-Dependent Neural Jump ODE (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far, the process itself and the coordinate-wise observation times were assumed to be independent, and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them.
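    The kind of data these extensions target can be sketched as follows: a toy process observed coordinate-wise at random times whose intensity depends on the current value of the path (dependent observations), with additive observation noise. This NumPy simulation uses assumed distributions purely for illustration and is not the data model or code used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_noisy_dependent_obs(n_steps=100, dim=2, noise_std=0.1):
    """Toy process observed incompletely at random times, where the observation
    probability depends on the path itself and observations carry additive noise."""
    dt = 1.0 / n_steps
    path = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_steps, dim)), axis=0)
    obs_prob = 1.0 / (1.0 + np.exp(-path))            # observation intensity depends on the path
    mask = rng.random((n_steps, dim)) < 0.3 * obs_prob  # which coordinates are seen, and when
    noisy_obs = np.where(mask, path + rng.normal(0.0, noise_std, path.shape), np.nan)
    obs_times = np.nonzero(mask.any(axis=1))[0] * dt   # times with at least one observed coordinate
    return path, noisy_obs, mask, obs_times

true_path, observations, obs_mask, times = sample_noisy_dependent_obs()
```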

    Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

    Full text link
    We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets $\mathcal{D}_1,\dots,\mathcal{D}_N$ for the same learning model $f_{\theta}$. Our objective is to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $\theta^\star_1,\ldots,\theta^\star_N$ obtained for each dataset, while respecting the loss function for the model $f_{\theta(T)}$ produced by the algorithm upon halting. We only allow for continual communication between each of the specialized models (nodes/agents) and the central planner (server) at each iteration (round). For the case where the model $f_{\theta}$ is a finite-rank kernel regression, we derive explicit updates for the regret-optimal algorithm. By leveraging symmetries within the regret-optimal algorithm, we further develop a nearly regret-optimal heuristic that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space. Additionally, we investigate the adversarial robustness of the regret-optimal algorithm, showing that an adversary which perturbs $q$ training pairs by at most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. To validate our theoretical findings, we conduct numerical experiments in the context of American option pricing, utilizing a randomly generated finite-rank kernel. Comment: 54 pages, 3 figures.
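    To fix ideas, the sketch below sets up a finite-rank regression per node (a polynomial feature map stands in for a finite-rank kernel) and fits each node's specialized parameters $\theta^\star_i$, followed by a deliberately naive averaging step at the server shown only for contrast. The feature map, data, and averaging baseline are assumptions for illustration; they are not the paper's regret-optimal updates.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(x, p=6):
    """Finite-rank feature map phi(x); polynomial features as a stand-in."""
    return np.stack([x ** k for k in range(p)], axis=-1)

def specialized_theta(X, y, lam=1e-3):
    """Ridge-regression solution theta_i^* for one node's dataset D_i."""
    Phi = features(X)
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

datasets = []
for _ in range(4):                      # N = 4 nodes, each with its own dataset D_i
    X = rng.uniform(-1, 1, 80)
    y = np.sin(3 * X) + 0.1 * rng.normal(size=X.shape)
    datasets.append((X, y))
theta_star = [specialized_theta(X, y) for X, y in datasets]

# Naive server-side aggregation (plain averaging), shown only for contrast; the
# paper instead derives regret-optimal iterative updates that balance deviation
# from each theta_i^* against the terminal loss of f_{theta(T)}.
theta_server = np.mean(theta_star, axis=0)
```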

    Denise: Deep Learning based Robust PCA for Positive Semidefinite Matrices

    Full text link
    The robust PCA of high-dimensional matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix specific, meaning that the algorithm must be re-run each time a new matrix should be decomposed. Since these algorithms are computationally expensive, it is preferable to learn and store a function that instantaneously performs this decomposition when evaluated. Therefore, we introduce Denise, a deep learning-based algorithm for robust PCA of symmetric positive semidefinite matrices, which learns precisely such a function. Theoretical guarantees that Denise's architecture can approximate the decomposition function, to arbitrary precision and with arbitrarily high probability, are obtained. The training scheme is also shown to converge to a stationary point of the robust PCA's loss function. We train Denise on a randomly generated dataset and evaluate the performance of the DNN on synthetic and real-world covariance matrices. Denise achieves results comparable to several state-of-the-art algorithms in terms of decomposition quality, but as only one evaluation of the learned DNN is needed, Denise outperforms all existing algorithms in terms of computation time.
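    A rough PyTorch sketch of the idea described above: a network maps a PSD matrix M to a factor U, so that L = U Uᵀ is low-rank and PSD by construction and S = M − L is pushed toward sparsity by an L1-type penalty. The architecture, dimensions, training data, and loss here are illustrative assumptions rather than Denise's actual design.

```python
import torch
import torch.nn as nn

n, k = 10, 3   # matrix size and target rank (illustrative values)

class RobustPCANet(nn.Module):
    """Map a PSD matrix M to a factor U so that L = U U^T is low-rank PSD
    and the residual S = M - L is encouraged to be sparse."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n * n, 128), nn.ReLU(),
                                 nn.Linear(128, n * k))

    def forward(self, M):
        U = self.net(M.flatten(1)).view(-1, n, k)
        L = U @ U.transpose(1, 2)            # low-rank PSD part
        S = M - L                            # residual, penalised towards sparsity
        return L, S

model = RobustPCANet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                          # train on random PSD matrices
    A = torch.randn(64, n, n)
    M = A @ A.transpose(1, 2) / n             # random PSD training batch
    L, S = model(M)
    loss = S.abs().mean()                     # L1-type penalty on the residual
    opt.zero_grad()
    loss.backward()
    opt.step()
```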

    FlorianKrach/RegretOptimalFederatedTransferLearning: Initial Release with Paper

    No full text
    Release of the code to reproduce the results from the paper.